Skip to content

fix(claude): honor CLAUDE_BIN_PATH in dev mode for libc-mismatch hosts#1481

Merged
Wirasm merged 2 commits intodevfrom
fix/claude-bin-path-dev-mode
Apr 29, 2026
Merged

fix(claude): honor CLAUDE_BIN_PATH in dev mode for libc-mismatch hosts#1481
Wirasm merged 2 commits intodevfrom
fix/claude-bin-path-dev-mode

Conversation

@Wirasm
Copy link
Copy Markdown
Collaborator

@Wirasm Wirasm commented Apr 29, 2026

Summary

  • Problem: On glibc Linux hosts (Ubuntu/Debian/Fedora), archon workflow run fails immediately in dev mode (bun run dev:server). The Claude Agent SDK auto-resolves its bundled native binary in [linux-x64-musl, linux-x64] order, picks the musl variant first, but the musl ELF interpreter (/lib/ld-musl-x86_64.so.1) does not exist on glibc — spawn fails and the SDK reports a misleading "binary not found" error.
  • Why it matters: Affects every glibc Linux contributor running from source. The documented escape hatch CLAUDE_BIN_PATH is dead code in dev mode (early-returned before the env var is read), so the only workaround is patching node_modules.
  • What changed: Hoisted the CLAUDE_BIN_PATH env-var block in binary-resolver.ts above the BUNDLED_IS_BINARY early return. Updated binary-resolver-dev.test.ts to reflect the new contract.
  • What did NOT change (scope boundary): Config-file path (assistants.claude.claudeBinaryPath) is intentionally still binary-mode-only — it's a per-repo setting, not a per-machine escape hatch. Autodetect and the throw-with-install-instructions paths also stay binary-mode-only. Binary-mode behavior is byte-identical to before.

UX Journey

Before

glibc Linux user                Archon (dev mode)             Claude SDK
────────────────                ─────────────────             ──────────
bun run dev:server  ──────────▶ resolveClaudeBinaryPath()
                                  BUNDLED_IS_BINARY=false
                                  → returns undefined
                                pathToClaudeCodeExecutable    auto-resolves
                                  omitted                     [musl, glibc]
                                                              picks musl first
                                                              spawn fails (no
                                                              /lib/ld-musl-...)
                                                              ◀── ENOENT
sees "binary not found" ◀────── error surfaced verbatim

Setting CLAUDE_BIN_PATH=/home/me/.local/bin/claude does NOTHING:
  resolveClaudeBinaryPath() still returns undefined before reading env.

After

glibc Linux user                Archon (dev mode)             Claude SDK
────────────────                ─────────────────             ──────────
export CLAUDE_BIN_PATH=         [reads env first]
  ~/.local/bin/claude
bun run dev:server  ──────────▶ resolveClaudeBinaryPath()
                                  *env set + file exists*
                                  → returns env path
                                pathToClaudeCodeExecutable=   spawn succeeds
                                  /home/me/.local/bin/claude  ──────────────▶
                                                              ◀── responds
sees response  ◀──────────────  streams to platform

env unset                  →  identical to today (returns undefined in dev)
env set + file missing     →  clear, actionable error pointing at the env path

Architecture Diagram

Before

┌────────────────────┐       ┌────────────────────────┐       ┌──────────────────┐
│ ClaudeProvider     │ ────▶ │ resolveClaudeBinaryPath│ ────▶ │ @anthropic-ai/   │
│ provider.ts        │       │ binary-resolver.ts     │       │ claude-agent-sdk │
└────────────────────┘       │   ┌──────────────────┐ │       │  (auto-resolve)  │
                             │   │ if (!BINARY)     │ │       └──────────────────┘
                             │   │   return undef ◀─┼─┼── env var unread in dev
                             │   │ env path?        │ │
                             │   │ config path?     │ │
                             │   │ autodetect?      │ │
                             │   └──────────────────┘ │
                             └────────────────────────┘

After

┌────────────────────┐       ┌────────────────────────┐       ┌──────────────────┐
│ ClaudeProvider     │ ────▶ │ resolveClaudeBinaryPath│ ====▶ │ @anthropic-ai/   │
│ provider.ts        │       │ binary-resolver.ts [~] │       │ claude-agent-sdk │
└────────────────────┘       │   ┌──────────────────┐ │       │  (auto-resolve)  │
                             │   │ env path? [+]    │ │       └──────────────────┘
                             │   │ if (!BINARY)     │ │            ▲
                             │   │   return undef   │ │            │
                             │   │ config path?     │ │            │
                             │   │ autodetect?      │ │            │
                             │   └──────────────────┘ │            │
                             └────────────────────────┘            │
                                       ║                           │
                                       ╚═══ env path takes ════════╝
                                           precedence in both modes

Connection inventory:

From To Status Notes
ClaudeProvider resolveClaudeBinaryPath unchanged Same call site
resolveClaudeBinaryPath Claude SDK auto-resolve modified Now bypassed in dev when env var is set
process.env.CLAUDE_BIN_PATH resolveClaudeBinaryPath modified Read in both modes (was binary-only)

Label Snapshot

  • Risk: risk: low
  • Size: size: XS
  • Scope: core (closest available; lives in @archon/providers)
  • Module: providers:claude-binary-resolver

Change Metadata

  • Change type: bug
  • Primary scope: core

Linked Issue

Validation Evidence (required)

$ bun test packages/providers/src/claude/binary-resolver-dev.test.ts
 5 pass / 0 fail / 5 expect() calls (Ran 5 tests across 1 file)

$ bun test packages/providers/src/claude/binary-resolver.test.ts
 9 pass / 0 fail / 16 expect() calls (Ran 9 tests across 1 file)

$ bun run validate
exit 0  (check:bundled, type-check, lint, format:check, tests — all green)
  • Evidence provided: All five bun run validate gates exit 0; both resolver test files pass (5 dev-mode + 9 binary-mode).
  • If any command is intentionally skipped, explain why: None skipped.

Security Impact (required)

  • New permissions/capabilities? No (env-var override already existed in binary mode; this only enables it in dev mode)
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No (still reads exactly the path the operator sets)
  • If any Yes, describe risk and mitigation: N/A

Compatibility / Migration

  • Backward compatible? Yes (env unset → byte-identical to today; env set was previously ignored in dev, now honored — additive)
  • Config/env changes? No (CLAUDE_BIN_PATH is documented; semantics broaden only)
  • Database migration needed? No
  • If yes, exact upgrade steps: N/A

Human Verification (required)

  • Verified scenarios:
    • Dev mode, env unset → resolver returns undefined (existing test)
    • Dev mode, env set + file exists → resolver returns env path (new test)
    • Dev mode, env set + file missing → resolver throws with actionable message naming the bad path (new test)
    • Dev mode, env set + config path also set → env wins (new test)
    • Dev mode, only config path set → config still ignored (new test, preserves old behavior)
    • Binary mode behavior unchanged (existing 9 tests still pass)
  • Edge cases checked: env-var precedence over config in dev; no-op when env is unset (no surprise behavior change for current Mac/Windows users or CI).
  • What was not verified: I did not reproduce the original musl-loader spawn failure locally (Mac host, glibc-only failure mode); the resolver-level fix is verified against the documented contract that the issue reporter validated end-to-end on Ubuntu glibc 2.33.

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: Only Claude provider binary resolution. Codex resolver is independent and untouched.
  • Potential unintended effects: A user who has CLAUDE_BIN_PATH exported globally (e.g. for binary builds) and then runs dev mode will now have that path used instead of the SDK's bundled binary. This is the desired behavior (and matches what the env var name implies), but it is a behavior change for anyone who had the env var set without realizing dev mode was ignoring it. If the path is valid this is harmless; if the path is stale they'll see a clear error pointing at the path.
  • Guardrails/monitoring for early detection: The error path includes the exact env-var value in its message ("CLAUDE_BIN_PATH is set to '' but the file does not exist"), making misconfigurations self-diagnosing.

Rollback Plan (required)

  • Fast rollback command/path: git revert bf8dabbe — single-commit revert restores prior behavior.
  • Feature flags or config toggles: None needed; users can simply unset CLAUDE_BIN_PATH to opt out without code changes.
  • Observable failure symptoms: If the fix regresses, glibc dev users would see the original "musl binary not found" error again. Mac/Windows/binary users would see no behavior difference whatsoever.

Risks and Mitigations

  • Risk: A user with a stale CLAUDE_BIN_PATH set in their shell may now hit the resolver-level error in dev mode where they previously didn't.
    • Mitigation: The error message names the exact path that's bad and tells them how to fix it (verify path / unset env var). Lower-friction than the original musl-loader silent failure on glibc.
  • Risk: Future SDK upgrades change the auto-resolution order in a way that makes the env-var workaround unnecessary.
    • Mitigation: This change is purely additive — when the env var is unset, behavior is byte-identical to today, so the workaround can stay forever without cost or be removed in a follow-up.

Summary by CodeRabbit

  • Bug Fixes

    • CLAUDE_BIN_PATH is now honored in dev mode; invalid paths produce clear errors and precedence rules are clarified.
  • Tests

    • Expanded tests for env-var handling, precedence, and unset/invalid scenarios with reliable env setup/teardown.
  • Documentation

    • Clarified dev vs binary-mode resolution and when to use the env var versus the config field.
  • Chores

    • Updated changelog entry describing the fix.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough

Walkthrough

Checks CLAUDE_BIN_PATH before the dev-mode early return so the env var is honored in both dev and binary modes; updates dev-mode tests, docs, and changelog to reflect env-first resolution while keeping config path binary-only.

Changes

Cohort / File(s) Summary
Binary Resolution Logic
packages/providers/src/claude/binary-resolver.ts
Move env-var check (CLAUDE_BIN_PATH) above the BUNDLED_IS_BINARY early return. Env-provided path is validated and returned with source: 'env' semantics (or an error thrown); config path remains binary-mode-only.
Dev-mode Tests
packages/providers/src/claude/binary-resolver-dev.test.ts
Update tests to spy on fileExists, manage process.env.CLAUDE_BIN_PATH via before/after hooks, and assert: unset behavior, config-only ignored in dev, env-var success, env-var missing-file error, env-over-config precedence, and empty-string treated as unset.
Documentation
packages/docs-web/src/content/docs/getting-started/ai-assistants.md
Add guidance that CLAUDE_BIN_PATH can override binary resolution in dev/source and binary modes; note claudeBinaryPath config is binary-mode-only.
Changelog
CHANGELOG.md
Add "Fixed" entry documenting that CLAUDE_BIN_PATH is now honored in dev mode and that config path remains binary-only; mention unchanged env-loading/.env isolation behavior.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Resolver
  participant Env as Environment
  participant FS as FileSystem
  participant Config as Config/Bundled

  Caller->>Resolver: resolveClaudeBinaryPath(config?)
  Resolver->>Env: read CLAUDE_BIN_PATH
  Env-->>Resolver: envPath (set / unset / empty)
  alt envPath set and non-empty
    Resolver->>FS: fileExists(envPath)
    FS-->>Resolver: true
    Resolver-->>Caller: return envPath
    else file missing
    FS-->>Resolver: false
    Resolver-->>Caller: throw "CLAUDE_BIN_PATH is set but the file does not exist"
  else envPath unset or empty
    Resolver->>Resolver: if !BUNDLED_IS_BINARY -> return undefined (dev)
    alt BUNDLED_IS_BINARY true
      Resolver->>Config: check configClaudeBinaryPath
      Config-->>Resolver: configPath (or none)
      alt configPath set
        Resolver->>FS: fileExists(configPath)
        FS-->>Resolver: true/false
        Resolver-->>Caller: return configPath or throw
      else
        Resolver->>Config: autodetect bundled binary
        Config-->>Resolver: detectedPath or none
        Resolver-->>Caller: return detectedPath or throw install instructions
      end
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A hop and a tweak, a tiny patch done,
CLAUDE now hears the env under sun.
When musl trips and glibc sighs,
One var clears the cloudy skies.
The rabbit nibbles bugs away — hooray! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: honoring CLAUDE_BIN_PATH in dev mode to fix a bug affecting glibc Linux users.
Description check ✅ Passed The PR description comprehensively follows the template with all major sections completed: summary, UX journey (before/after), architecture diagrams, labels, validation evidence, security/compatibility/rollback analysis, and human verification details.
Linked Issues check ✅ Passed All primary objectives from issue #1474 are met: the CLAUDE_BIN_PATH env var is now honored in dev mode, clear errors are thrown when the path is missing, config-file path remains binary-mode-only, and tests assert the new contract.
Out of Scope Changes check ✅ Passed All changes are tightly scoped to the binary resolver and its tests. Documentation updates (docs-web, CHANGELOG) directly support the fix. No unrelated or scope-creeping modifications present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/claude-bin-path-dev-mode

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 6/8 reviews remaining, refill in 8 minutes and 2 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@Wirasm
Copy link
Copy Markdown
Collaborator Author

Wirasm commented Apr 29, 2026

Review Summary

Verdict: minor-fixes-needed

Your fix correctly hoists CLAUDE_BIN_PATH above the BUNDLED_IS_BINARY early-return, enabling the escape hatch on Linux hosts where the SDK picks the wrong platform variant. The code quality, error handling, and test coverage are solid. What's blocking merge: three docs pages still describe dev mode as "auto-resolves via node_modules only" and don't mention that CLAUDE_BIN_PATH now works — users won't know the feature exists.

Blocking issues

  • packages/docs-web/src/content/docs/getting-started/ai-assistants.md: Add a note that CLAUDE_BIN_PATH can override the dev-mode auto-resolution. Example: "You can override with CLAUDE_BIN_PATH if the SDK picks the wrong platform binary (e.g. musl vs glibc on Linux)."
  • packages/docs-web/src/content/docs/getting-started/configuration.md: Same update in the dev-mode configuration section.
  • packages/docs-web/src/content/docs/reference/troubleshooting.md: Update "unaffected" language — CLAUDE_BIN_PATH can now override in dev mode, but config-file path remains binary-mode-only.

Suggested fixes

  • CHANGELOG.md: Add entry under [Unreleased] describing that CLAUDE_BIN_PATH is now honored in dev mode (previously ignored). Note that assistants.claude.claudeBinaryPath in config remains dev-mode-only.
  • binary-resolver.ts:70: Add a test for empty-string CLAUDE_BIN_PATH ('') to verify it falls through to undefined.

Minor / nice-to-have

  • binary-resolver-dev.test.ts:38: For consistency with the binary-mode test file, drop | undefined from fileExistsSpy type and the ? on mockRestore().
  • binary-resolver-dev.test.ts:3-9 and binary-resolver.ts:59-61: The detailed glibc/musl failure description is valuable but risks drifting. Trim to a one-line "escape hatch for SDK binary resolution mismatches" framing.

Compliments

  • The PR body is thorough — architecture diagrams, actual test output, security/compatibility/rollback analysis, and human verification all included. Easy to review.
  • The test throws when CLAUDE_BIN_PATH is set but file does not exist in dev mode directly covers the fail-fast improvement.
  • The comment "Config-file path is intentionally NOT honored in dev mode" in the test file captures a policy decision that would otherwise confuse future maintainers.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

Wirasm added a commit that referenced this pull request Apr 29, 2026
- CHANGELOG entry under [Unreleased] / Fixed describing the dev-mode
  CLAUDE_BIN_PATH escape hatch (previously ignored). Notes that
  config-file path remains binary-mode-only and that env-loading +
  target-repo .env isolation are unchanged downstream.
- Empty-string test pinning that CLAUDE_BIN_PATH='' falls through
  to undefined rather than throwing — protects against a future
  predicate typo that would treat empty as "set".
- One-line note in ai-assistants.md "Binary path configuration"
  section pointing dev-mode users at the env-var override for the
  glibc/musl mismatch case.

Skipped from the review:
- The other two docs-page rewrites (configuration.md /
  troubleshooting.md): the error message itself names CLAUDE_BIN_PATH,
  and #1474 documents the use case publicly. One mention in
  ai-assistants.md is enough for discovery.
- Type-style consistency tweaks in the test file: pure bikeshed.
Wirasm added 2 commits April 29, 2026 12:36
The Claude Agent SDK auto-resolves its bundled native binary in
[linux-x64-musl, linux-x64] order. On glibc Linux hosts (Ubuntu/Debian/
Fedora), Bun installs both via optionalDependencies and the musl variant
is picked first; its ELF interpreter (/lib/ld-musl-x86_64.so.1) does not
exist on glibc, so spawn fails and the SDK reports a misleading "binary
not found" — the file is on disk, the loader is not.

The documented escape hatch CLAUDE_BIN_PATH was dead code in dev mode:
the resolver early-returned undefined when BUNDLED_IS_BINARY=false before
ever reading the env var. The only workaround was patching node_modules.

Move the env-var block above the BUNDLED_IS_BINARY return. Config-file
path stays binary-mode-only — it's per-repo, not per-machine; env is the
right knob for libc mismatches.

Behavior preserved:
- env unset                  → unchanged (undefined in dev, autodetect/throw in binary)
- env set + file exists      → resolved (was binary-only; now also dev)
- env set + file missing     → clear error (was binary-only; now also dev)

Closes #1474
- CHANGELOG entry under [Unreleased] / Fixed describing the dev-mode
  CLAUDE_BIN_PATH escape hatch (previously ignored). Notes that
  config-file path remains binary-mode-only and that env-loading +
  target-repo .env isolation are unchanged downstream.
- Empty-string test pinning that CLAUDE_BIN_PATH='' falls through
  to undefined rather than throwing — protects against a future
  predicate typo that would treat empty as "set".
- One-line note in ai-assistants.md "Binary path configuration"
  section pointing dev-mode users at the env-var override for the
  glibc/musl mismatch case.

Skipped from the review:
- The other two docs-page rewrites (configuration.md /
  troubleshooting.md): the error message itself names CLAUDE_BIN_PATH,
  and #1474 documents the use case publicly. One mention in
  ai-assistants.md is enough for discovery.
- Type-style consistency tweaks in the test file: pure bikeshed.
@Wirasm Wirasm force-pushed the fix/claude-bin-path-dev-mode branch from 6e67d8a to 8188492 Compare April 29, 2026 09:37
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
packages/providers/src/claude/binary-resolver-dev.test.ts (1)

23-39: Prefer per-test spy teardown (afterEach or test-local try/finally) to avoid leak-on-failure.

Current cleanup is good, but moving spy restore to per-test teardown makes isolation more deterministic if a test exits early.

Refactor sketch
-import { describe, test, expect, mock, beforeEach, afterAll, spyOn } from 'bun:test';
+import { describe, test, expect, mock, beforeEach, afterEach, afterAll, spyOn } from 'bun:test';
@@
   beforeEach(() => {
     delete process.env.CLAUDE_BIN_PATH;
-    fileExistsSpy?.mockRestore();
-    fileExistsSpy = undefined;
   });
+
+  afterEach(() => {
+    fileExistsSpy?.mockRestore();
+    fileExistsSpy = undefined;
+  });

Also applies to: 51-74

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/claude/binary-resolver-dev.test.ts` around lines 23 -
39, The tests currently restore the fileExistsSpy only in beforeEach and
afterAll which can leak if a test fails early; change cleanup to per-test
teardown by moving fileExistsSpy?.mockRestore() into an afterEach (or wrap each
test body with try/finally to restore) so each test restores fileExistsSpy
immediately after it runs; update the same pattern in the other block referenced
(lines 51-74) so both occurrences of the fileExistsSpy mock are restored in
afterEach (or via test-local try/finally) and set fileExistsSpy = undefined in
that afterEach as well.
packages/providers/src/claude/binary-resolver.ts (1)

5-12: Consider consolidating the glibc/musl rationale to one concise comment.

The same platform-mismatch explanation is duplicated in two comment blocks; keeping one short canonical note will reduce documentation drift.

Also applies to: 71-73

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/claude/binary-resolver.ts` around lines 5 - 12,
Consolidate the duplicated glibc/musl platform-mismatch rationale into a single
concise comment near the resolution order for pathToClaudeCodeExecutable: remove
the repeated paragraph (the one also present around lines 71-73) and replace
with one short canonical note that mentions the platform mismatch concern and
that CLAUDE_BIN_PATH can override the auto-resolved per-platform binary; update
the comment around the resolution list (referencing CLAUDE_BIN_PATH and
pathToClaudeCodeExecutable) so it is the single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/providers/src/claude/binary-resolver-dev.test.ts`:
- Around line 23-39: The tests currently restore the fileExistsSpy only in
beforeEach and afterAll which can leak if a test fails early; change cleanup to
per-test teardown by moving fileExistsSpy?.mockRestore() into an afterEach (or
wrap each test body with try/finally to restore) so each test restores
fileExistsSpy immediately after it runs; update the same pattern in the other
block referenced (lines 51-74) so both occurrences of the fileExistsSpy mock are
restored in afterEach (or via test-local try/finally) and set fileExistsSpy =
undefined in that afterEach as well.

In `@packages/providers/src/claude/binary-resolver.ts`:
- Around line 5-12: Consolidate the duplicated glibc/musl platform-mismatch
rationale into a single concise comment near the resolution order for
pathToClaudeCodeExecutable: remove the repeated paragraph (the one also present
around lines 71-73) and replace with one short canonical note that mentions the
platform mismatch concern and that CLAUDE_BIN_PATH can override the
auto-resolved per-platform binary; update the comment around the resolution list
(referencing CLAUDE_BIN_PATH and pathToClaudeCodeExecutable) so it is the single
source of truth.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4d22e1e-d5bc-4052-942a-8a1cbcb13953

📥 Commits

Reviewing files that changed from the base of the PR and between 6e67d8a and 8188492.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • packages/docs-web/src/content/docs/getting-started/ai-assistants.md
  • packages/providers/src/claude/binary-resolver-dev.test.ts
  • packages/providers/src/claude/binary-resolver.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/docs-web/src/content/docs/getting-started/ai-assistants.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md

@Wirasm Wirasm merged commit 4885ee6 into dev Apr 29, 2026
4 checks passed
@Wirasm Wirasm deleted the fix/claude-bin-path-dev-mode branch April 29, 2026 09:43
@Wirasm Wirasm mentioned this pull request Apr 29, 2026
Wirasm added a commit that referenced this pull request Apr 29, 2026
Regression from #1481 (honor CLAUDE_BIN_PATH in dev mode). The CI workflow
set `CLAUDE_BIN_PATH: ~/.local/bin/claude` in YAML `env:` blocks; YAML does
not expand `~`, so the literal string was passed to the resolver.

Before #1481, dev mode silently ignored the env var and the SDK
auto-resolved its bundled binary — so the broken value was harmless. After
#1481, dev mode honors it, the file-existence check fails on the literal
`~`, and the smoke job aborts with "CLAUDE_BIN_PATH is set ... but the file
does not exist".

Move the env-var assignment into the run-step shell where `$HOME` resolves.
Both e2e-claude and e2e-mixed-providers jobs are affected.
prospapledge88 added a commit to prospapledge88/Archon that referenced this pull request May 5, 2026
* fix(core/test): split connection.test.ts from DB-test batch to avoid mock pollution (coleam00#1269)

messages.test.ts uses mock.module('./connection', ...) at module-load time.
Per CLAUDE.md:131 (Bun issue oven-sh/bun#7823), mock.module() is process-
global and irreversible. When Bun pre-loads all test files in a batch, the
mock shadows the real connection module before connection.test.ts runs,
causing getDatabaseType() to always return the mocked value regardless of
DATABASE_URL.

Move connection.test.ts into its own `bun test` invocation immediately
after postgres.test.ts (which runs alone) and before the big DB/utils/
config/state batch that contains messages.test.ts. This follows the same
isolation pattern already used for command-handler, clone, postgres, and
path-validation tests.

* fix(setup): align PORT default on 3090 across .env.example, wizard, and JSDoc (coleam00#1152) (coleam00#1271)

The server's getPort() fallback changed from 3000 to 3090 in the Hono
migration (coleam00#318), but .env.example, the setup wizard's generated .env,
and the JSDoc describing the fallback were not updated — leaving three
different sources of truth for "the default PORT."

When the wizard writes PORT=3000 to ~/.archon/.env (which the Hono
server loads with override: true, while Vite only reads repo-local
.env), the two processes can land on different ports silently. That
mismatch is the real mechanism behind the failure described in coleam00#1152.

- .env.example: comment out PORT, document 3090 as the default
- packages/cli/src/commands/setup.ts: wizard no longer writes PORT=3000
  into the generated .env; fix the "Additional Options" note
- packages/cli/src/commands/setup.test.ts: assert no bare PORT= line and
  the commented default is present
- packages/core/src/utils/port-allocation.ts: fix stale JSDoc "default
  3000" -> "default 3090"
- deploy/.env.example: keep Docker default at 3000 (compose/Caddy target
  that) but annotate it so users don't copy it for local dev

Single source of truth for the local-dev default is now basePort in
port-allocation.ts.

* fix(providers/claude): use || instead of ?? in hasExplicitTokens to handle empty-string env vars (coleam00#1028)

Closes coleam00#1027

* chore(deps): bump claude-agent-sdk to 0.2.121, codex-sdk to 0.125.0 (coleam00#1460)

Both SDKs were ~30 patch releases behind. Validation suite passes
(type-check, lint, format, tests across all 10 packages) without code
changes. The only sustained Claude SDK behavior change in the range —
v0.2.111's options.env overlay/replace flap, since reverted to overlay —
is a no-op for Archon, which already passes { ...process.env } as the
SDK env.

* fix(cli): lazy-import bundled skill files so non-setup commands don't crash on missing source (coleam00#1394)

The 18 top-level `import … with { type: 'text' }` statements in
`bundled-skill.ts` resolve at module load. For `bun build --compile` that's
build time, so the binary embeds the strings and works regardless of any
on-disk skill files. For `bun link` (linked-source) installs that's every
`archon` invocation — including `archon --help`, which doesn't even use the
skill content. If any of the 18 source files are missing or moved, the
import fails and the CLI cannot start at all.

The skill content is data the binary deploys via `archon setup`, not data
the CLI needs at runtime. There's only one consumer in production code:
`copyArchonSkill()` in `setup.ts`. Moving the import into that function as
a dynamic import preserves the compiled-binary behavior (Bun's bundler
statically analyses literal-string `import()` and embeds the chunk —
verified by grepping the SKILL.md frontmatter out of a freshly compiled
binary) while making the linked-source install resilient: only `archon
setup` triggers the bundled-skill module load now. Verified: a known skill
string appears in the compiled binary 1×, and `archon --help` no longer
needs the source files to start.

`copyArchonSkill()` becomes async because the dynamic import is a Promise.
The single production call site is already in an async function and gets
an `await`. The four `setup.test.ts` cases become async too.

* fix(claude): stop passing --no-env-file to native binary in dev mode (coleam00#1461)

* fix(claude): stop passing --no-env-file to native binary in dev mode

The Claude Agent SDK switched from shipping `cli.js` inside the package
to per-platform native binaries via optional deps somewhere in the
0.2.x series. As of 0.2.121 there is no `cli.js` in the SDK package;
dev mode resolves to `@anthropic-ai/claude-agent-sdk-darwin-arm64/claude`
(Mach-O). That native binary rejects `--no-env-file` with
`error: unknown option '--no-env-file'` and the subprocess exits 1.

`shouldPassNoEnvFile` was returning true on `cliPath === undefined` on
the assumption that "dev mode = JS executable run via Bun". That
assumption is dead. Tighten the predicate to only return true on an
explicit `.js` suffix, so we only emit the flag when the SDK is going
to spawn a Bun-runnable script.

CWD `.env` leak protection is unaffected. `stripCwdEnv()` in
`@archon/paths` (coleam00#1067) deletes Bun-auto-loaded `.env`/`.env.local`/
`.env.development`/`.env.production` keys from `process.env` at every
Archon entry point before any subprocess is spawned. The native Claude
binary does not auto-load `.env` from its cwd either. `--no-env-file`
was belt-and-suspenders for the JS-via-Bun case only.

Verified end-to-end with a sentinel: added a unique
`ARCHON_LEAK_SENTINEL_$$` to Archon's `.env`, ran e2e-claude-smoke
with a bash probe checking the subprocess env. stderr shows
`[archon] stripped 23 keys from /Users/rasmus/Projects/cole/Archon
(.env, .env.local)` — sentinel was deleted. Bash node prints
`PASS: simple='4', no sentinel leak`. Workflow completes cleanly,
no `--no-env-file` rejection from the SDK binary.

bun run validate: green across all 10 packages.

* fix(claude): address review on coleam00#1461 (stale docs + test gaps)

Critical: file-level JSDoc at provider.ts:18 still claimed dev mode
resolves cli.js. Updated to reflect SDK 0.2.x's switch to per-platform
native binaries.

Important: security.md still listed --no-env-file as item 2 of
target-repo .env isolation. Scoped that bullet to legacy
Bun-runnable JS entry points and called out that native binaries
don't auto-load .env from cwd. Added an Unreleased Fixed entry to
CHANGELOG.md. Updated binary-resolver.ts JSDoc title that referenced
cli.js.

Polish: widened the predicate to accept .mjs and .cjs (also
Bun-runnable JS — matches the SDK's own internal extension list).
Dropped the redundant `passesNoEnvFile` log field that mirrored
`isJsExecutable`. Added unit cases for .mjs/.cjs (now true) and
.ts/.tsx/.jsx (deliberately false — never SDK entry points).

Added an integration test that mocks resolveClaudeBinaryPath to
return a .js path and asserts executableArgs: ['--no-env-file']
flows through buildBaseClaudeOptions all the way to the SDK call —
catches future regressions in the conditional spread.

bun run validate: green across all 10 packages.

* fix(orchestrator): clear stale session ID on error_during_execution to prevent infinite failure loop (coleam00#1294)

* fix(orchestrator): clear stale session ID on error_during_execution to prevent infinite failure loop

When a Claude API session expires (e.g. after container restart), the orchestrator
persists the new (failed) session ID from the error result, causing every subsequent
message in that conversation to hit the same error — an infinite failure loop.

Fix: on error_during_execution result, set assistant_session_id to NULL instead of
persisting the failed session ID. The next message starts a fresh session with full
context rebuilt from the DB. Conversation history is unaffected since it lives in
remote_agent_messages, independent of the Claude session.

Changes:
- updateSession() and tryPersistSessionId() now accept string | null
- Both handleStreamMode and handleBatchMode clear session ID on error_during_execution

Fixes coleam00#1280

* test(orchestrator): add stale session clearing tests + address review feedback

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>

* fix(claude): honor CLAUDE_BIN_PATH in dev mode for libc-mismatch hosts (coleam00#1481)

* fix(claude): honor CLAUDE_BIN_PATH in dev mode for libc-mismatch hosts

The Claude Agent SDK auto-resolves its bundled native binary in
[linux-x64-musl, linux-x64] order. On glibc Linux hosts (Ubuntu/Debian/
Fedora), Bun installs both via optionalDependencies and the musl variant
is picked first; its ELF interpreter (/lib/ld-musl-x86_64.so.1) does not
exist on glibc, so spawn fails and the SDK reports a misleading "binary
not found" — the file is on disk, the loader is not.

The documented escape hatch CLAUDE_BIN_PATH was dead code in dev mode:
the resolver early-returned undefined when BUNDLED_IS_BINARY=false before
ever reading the env var. The only workaround was patching node_modules.

Move the env-var block above the BUNDLED_IS_BINARY return. Config-file
path stays binary-mode-only — it's per-repo, not per-machine; env is the
right knob for libc mismatches.

Behavior preserved:
- env unset                  → unchanged (undefined in dev, autodetect/throw in binary)
- env set + file exists      → resolved (was binary-only; now also dev)
- env set + file missing     → clear error (was binary-only; now also dev)

Closes coleam00#1474

* chore(claude): address CodeRabbit review on coleam00#1481

- CHANGELOG entry under [Unreleased] / Fixed describing the dev-mode
  CLAUDE_BIN_PATH escape hatch (previously ignored). Notes that
  config-file path remains binary-mode-only and that env-loading +
  target-repo .env isolation are unchanged downstream.
- Empty-string test pinning that CLAUDE_BIN_PATH='' falls through
  to undefined rather than throwing — protects against a future
  predicate typo that would treat empty as "set".
- One-line note in ai-assistants.md "Binary path configuration"
  section pointing dev-mode users at the env-var override for the
  glibc/musl mismatch case.

Skipped from the review:
- The other two docs-page rewrites (configuration.md /
  troubleshooting.md): the error message itself names CLAUDE_BIN_PATH,
  and coleam00#1474 documents the use case publicly. One mention in
  ai-assistants.md is enough for discovery.
- Type-style consistency tweaks in the test file: pure bikeshed.

* fix(deps): bump hono to ^4.12.16 and @hono/node-server to ^1.19.13 (closes coleam00#1484) (coleam00#1499)

* fix(orchestrator): create ~/.archon/workspaces before AI provider spawn (coleam00#1529)

* fix(orchestrator): create ~/.archon/workspaces before AI provider spawn

On a fresh install, ~/.archon/workspaces doesn't exist yet. The
orchestrator passes that path as cwd to the AI provider, which calls
spawn() — which raises ENOENT. The error is then misclassified as
"binary not found" in the friendly-error path, surfacing as an
incorrect "Claude binary not found" message.

Add ensureArchonWorkspacesPath() in @archon/paths that mkdir -p's the
directory and returns the path. Use it at the orchestrator's spawn-cwd
site so the directory is guaranteed to exist before spawn().

Other call sites of getArchonWorkspacesPath() (workflow discovery,
path-prefix comparisons) only consume the path string and don't need
the directory to exist; they keep using the pure getter.

Closes coleam00#1528

* test(orchestrator): assert ensureArchonWorkspacesPath is called

Capture the @archon/paths mock as a named variable and assert it was
called in the syncWorkspace handleMessage path. Without this, the test
suite passes even if orchestrator-agent.ts:824 reverts to the
non-ensuring getArchonWorkspacesPath() variant — exactly the regression
that surfaced as 'Claude Code native binary not found' in coleam00#1528.

* docs(changelog): add Tier 1 batch 2 cherry-pick entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Co-authored-by: DIY Smart Code <thomas@thirty3.de>
Co-authored-by: Cocoon-Break <54054995+kuishou68@users.noreply.github.com>
Co-authored-by: Kagura <kagura.agent.ai@gmail.com>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Co-authored-by: Yasser <116118149+YrFnS@users.noreply.github.com>
Co-authored-by: Truffle <truffleagent@gmail.com>
Co-authored-by: cjnprospa <sirhcle.j23@gmail.com>
prospapledge88 pushed a commit to prospapledge88/Archon that referenced this pull request May 6, 2026
Regression from coleam00#1481 (honor CLAUDE_BIN_PATH in dev mode). The CI workflow
set `CLAUDE_BIN_PATH: ~/.local/bin/claude` in YAML `env:` blocks; YAML does
not expand `~`, so the literal string was passed to the resolver.

Before coleam00#1481, dev mode silently ignored the env var and the SDK
auto-resolved its bundled binary — so the broken value was harmless. After
coleam00#1481, dev mode honors it, the file-existence check fails on the literal
`~`, and the smoke job aborts with "CLAUDE_BIN_PATH is set ... but the file
does not exist".

Move the env-var assignment into the run-step shell where `$HOME` resolves.
Both e2e-claude and e2e-mixed-providers jobs are affected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(claude): dev-mode workflows fail on glibc Linux due to SDK musl-first auto-resolution; honor CLAUDE_BIN_PATH as escape hatch

1 participant